Listing of Claims: 

1. (Currently amended) A method in a data processing system for generating and storing
in a database an entry, the method comprising the steps of:

generating an entry comprising:
i) data identifying a molecule;

ii) data identifying at least one region in the molecule; and 

iii) a set of axes derived from property distribution information of the at least one region, 
the set of axes characterizing the at least one region; 

generating at least one descriptor vector for the at least one region; 

applying a mapping to the at least one descriptor vector associated with the at least one 
region to construct a key based on preselected criteria; and 

storing the entry in a memory, wherein the key is associated with the entry such that the 
key indexes the entry for retrieval thereof. 

[[In a data processing system wherein descriptor vectors associated with a plurality of
regions of molecules are stored in a database, a method for generating and storing data
characterizing at least one region of said plurality of regions, the method comprising the steps
of:

generating an entry comprising i) an identifier that identifies said at least one region, and
ii) data characterizing a set of axes derived from a property distribution of said at least one
region;

applying a mapping to the descriptor vector associated with said at least one region based
on preselected criteria;

generating a key that corresponds to said mapping of the descriptor vector associated
with said at least one region; and

storing said entry in a memory, wherein said key is associated with said entry such that
the key indexes the entry for retrieval thereof.]]
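
For illustration only, the generating, mapping, and storing steps of claim 1 might be sketched as follows. The names `Entry`, `make_key`, and `RegionIndex`, and the `bin_width` parameter, are hypothetical. The claim does not say how the mapping turns a descriptor vector into a key, so the projection-and-binning scheme here is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    molecule_id: str   # i) data identifying a molecule
    region_id: str     # ii) data identifying a region in the molecule
    axes: tuple        # iii) set of axes characterizing the region


def make_key(descriptor, components, bin_width=0.5):
    """Assumed mapping: project onto each component vector, then bin."""
    key = []
    for component in components:
        projection = sum(d * c for d, c in zip(descriptor, component))
        key.append(int(projection // bin_width))
    return tuple(key)


class RegionIndex:
    """In-memory store in which the key indexes entries for retrieval."""

    def __init__(self, components, bin_width=0.5):
        self.components = components
        self.bin_width = bin_width
        self._buckets = defaultdict(list)

    def store(self, entry, descriptor):
        key = make_key(descriptor, self.components, self.bin_width)
        self._buckets[key].append(entry)
        return key

    def lookup(self, descriptor):
        key = make_key(descriptor, self.components, self.bin_width)
        return list(self._buckets.get(key, []))
```

In this sketch, two regions whose descriptor vectors fall in the same bins along every component vector share a key, so a lookup with a similar descriptor retrieves the stored entry.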

2. (Cancelled) 

3. (Cancelled)



4. (Currently amended) The method of claim 1, wherein [[said]] the property
distribution information of [[said]] the at least one region is computed from a convolution with a
probe function to a property field.

5. (Currently amended) The method of claim 1, wherein [[said]] the at least one
plurality of descriptor vectors [[are]] is classified into groups, and wherein [[said]] the mapping step
maps [[said]] the at least one descriptor vectors to a space discriminating between said groups of
descriptor vectors.

6. (Currently amended) The method of claim 5, wherein [[said]] the mapping is derived
from the steps of:

generating first data representing differences between said groups of descriptor vectors; 
generating second data representing variations within said groups of descriptor vectors; 

identifying a set of component vectors that maximizes a ratio of variations between
groups to the variations within groups along the component vectors as a discriminant criterion
function [[an F-distributed criterion function, said criterion function having a numerator based
upon said first data and a denominator based upon said second data]];

generating a criterion function for subsets of the component vectors, wherein the criterion
function utilizes the first data and the second data [[an F-distributed statistic for subsets of said
component vectors, said statistic having a numerator based upon said first data and a
denominator based upon said second data]];

for each particular subset of [[said]] component vectors, calculating a probability value for
the [[F-distributed statistic]] criterion functions associated with the particular subset;

selecting a probability value from probability values for [[said]] the subsets of said
component vectors based upon a predetermined criterion;

identifying the subset of [[said]] component vectors associated with the selected probability
value; and

generating a mapping to a space corresponding to the subset of [[said]] component vectors
associated with the selected probability value, and storing the mapping for subsequent
processing.

7. (Currently amended) The method of claim 6, wherein [[said]] the first data comprises
a matrix $S_b$ representing covariance between [[said]] the groups of descriptor vectors, and [[said]] the
second data comprises a matrix $S_w$ representing covariance within [[said]] the groups of descriptor
vectors.

8. (Currently amended) The method of claim 7, wherein [[said]] the criterion function
has the general form:

$$\lambda(\hat{e}) = C \, \frac{\hat{e}^{T} S_b \, \hat{e}}{\hat{e}^{T} S_w \, \hat{e}}$$

where $\hat{e}$ is some vector, $T$ indicates a transpose, $S_b$ is a first data representing covariance, $S_w$
is a second data representing covariance and $C$ is a constant based upon degrees of freedom in
$S_b$ and $S_w$.

9. (Currently amended) The method of claim 8, wherein the variable C is determined
as follows:

$$C = \frac{1/\text{degrees of freedom in matrix } S_b}{1/\text{degrees of freedom in matrix } S_w} = \frac{1/(N-1)}{1/\left(\sum n_i - N\right)}$$

where $N$ represents the number of groups of descriptor vectors, $n_i$ represents the number of
regions, and $\sum$ represents the sum of $n_i$ for the $N$ groups.
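
As a hedged illustration of claims 7 to 9, the sketch below evaluates a criterion of the assumed form C times the between-group quadratic form divided by the within-group quadratic form, for one candidate vector. The function names are hypothetical, and treating $S_b$ and $S_w$ as unnormalized scatter matrices (so that C supplies the degrees-of-freedom scaling) is an assumption, not something the claims state.

```python
def dof_constant(group_sizes):
    """C = (1 / (N - 1)) / (1 / (sum(n_i) - N)) for N groups."""
    n_groups = len(group_sizes)
    return (1.0 / (n_groups - 1)) / (1.0 / (sum(group_sizes) - n_groups))


def scatter_matrices(groups):
    """Return (S_b, S_w) as unnormalized between/within scatter matrices."""
    dim = len(groups[0][0])
    rows = [row for group in groups for row in group]
    grand = [sum(r[k] for r in rows) / len(rows) for k in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    s_w = [[0.0] * dim for _ in range(dim)]
    for group in groups:
        mean = [sum(r[k] for r in group) / len(group) for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                s_b[i][j] += len(group) * (mean[i] - grand[i]) * (mean[j] - grand[j])
                for r in group:
                    s_w[i][j] += (r[i] - mean[i]) * (r[j] - mean[j])
    return s_b, s_w


def quadratic_form(vec, matrix):
    """Compute vec^T * matrix * vec."""
    n = len(vec)
    return sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))


def criterion(vec, groups):
    """Assumed criterion: C * (e^T S_b e) / (e^T S_w e)."""
    s_b, s_w = scatter_matrices(groups)
    c = dof_constant([len(group) for group in groups])
    return c * quadratic_form(vec, s_b) / quadratic_form(vec, s_w)


# Toy data: two groups of two descriptor vectors each (illustrative values).
toy_groups = [[(0.0, 0.0), (2.0, 0.0)], [(4.0, 1.0), (6.0, 1.0)]]
print(criterion((1.0, 0.0), toy_groups))  # 8.0 for this toy data
```

Under these assumptions the value coincides with the one-way analysis-of-variance F ratio of the descriptor vectors projected onto the candidate vector.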

10. (Currently amended) The method of claim 7, wherein the step of identifying a set
of component vectors that maximizes an [[F distributed]] F-distributed criterion function comprises
the substeps of:

determining a set of (eigenvalue, eigenvector) pairs for the matrix $S_w$; and
determining [[said]] the set of component vectors based upon [[said]] the set of (eigenvalue,
eigenvector) pairs for the matrix $S_w$.

11. (Currently amended) The method of claim 10, wherein [[said statistic]] the
F-distributed statistic for a given subset of component vectors is based upon value of [[said]] the
criterion function for [[said]] the subset of component vectors.

12. (Currently amended) The method of claim 11, wherein [[said statistic]] the
F-distributed statistic for a given subset of component vectors has the following form:

where $\lambda$ represents the value of the criterion function at a component vector in the given subset,
$C$ is a constant, $L_s$ represents the number of $\lambda$ values in the given subset of component vectors,
and the $\sum$ operation sums over the $L_s$ $\lambda$ values in the given subset of component vectors.

13. (Currently amended) The method of claim 12, wherein [[said a]] the probability
value for a particular F-distributed statistic represents a probability value that the particular
F-distributed statistic could have been larger by chance.

14. (Currently amended) The method of claim 13, wherein [[said]] the probability value
selected from probability values for [[said]] the subsets of component vectors is a minimum
probability value of [[said]] the probability values for [[said]] the subsets of component vectors.

15. (Currently amended) The method of claim 6, wherein [[said]] the mapping for [[said]]
the at least one descriptor vector performs a loop over each component vector belonging to the
subset of component vectors associated with the selected probability;

wherein, in each iteration of [[said]] the loop, dot product of [[said]] the descriptor vector with a
transpose of a unit vector for the given component vector is added to a running sum.
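
Read literally, the loop of claim 15 could be sketched as below. The function names `unit` and `map_descriptor` are hypothetical, and returning a single scalar running sum is an assumption drawn from the claim wording rather than a confirmed implementation.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def map_descriptor(descriptor, component_subset):
    """Loop over the selected component vectors, accumulating dot products."""
    running_sum = 0.0
    for component in component_subset:
        e_hat = unit(component)
        # Dot product of the descriptor vector with the unit component vector.
        running_sum += sum(d * e for d, e in zip(descriptor, e_hat))
    return running_sum


print(map_descriptor([3.0, 4.0], [[2.0, 0.0], [0.0, 5.0]]))  # 7.0
```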

31. (New) The method of claim 1, wherein the at least one descriptor vector is 
invariant to rotation and translation of the at least one region. 

32. (New) The method of claim 31, wherein the set of axes is derived from principal 
axes of second moments of a region of the property distribution information. 

33. (New) The method of claim 6, wherein the probability value is obtained by 
treating the ratio as an F-distributed statistic. 

34. (New) The method of claim 6, wherein the probability value is obtained by any 
one of cross-validation, jack-knife and bootstrap estimations. 
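
As one possible reading of the bootstrap option in claim 34, the sketch below estimates the probability that the between-group to within-group ratio could have been at least as large by chance. It assumes the descriptor vectors have already been projected to scalar values along a component vector, and it resamples from the pooled values under a no-group-difference assumption. The function names, the resample count, and the seed are hypothetical.

```python
import random


def variance_ratio(groups):
    """Between-group over within-group mean squares for scalar groups."""
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return (between / (n_groups - 1)) / (within / (n_total - n_groups))


def bootstrap_probability(groups, n_resamples=500, seed=0):
    """Fraction of pooled resamples whose ratio reaches the observed ratio."""
    rng = random.Random(seed)
    observed = variance_ratio(groups)
    pooled = [x for g in groups for x in g]
    hits = 0
    for _ in range(n_resamples):
        # Draw groups of the original sizes, with replacement, from the pool.
        resampled = [[rng.choice(pooled) for _ in g] for g in groups]
        if variance_ratio(resampled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A small estimate suggests the grouping is unlikely to arise by chance along that component vector; selecting the subset with the minimum such value would correspond to the criterion of claim 14.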

35. (New) The method of claim 6, wherein application in constructing the
discriminant criterion function includes boosting and bagging techniques.